Tiling for Parallel Execution - Optimizing Node Cache Performance
Similar resources
Parallel Processing Letters, © World Scientific Publishing Company. Tiling for Parallel Execution - Optimizing Node Cache Performance
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining the tile size that achieves the highest cache hit rate. The widening disparity between the processor's peak instruction rate and the main memory access time in modern processo...
A Stable and Efficient Loop Tiling Algorithm
Loop tiling is an effective optimizing transformation for boosting the memory performance of a program, especially for dense-matrix scientific computations. The magnitude and stability of the achieved performance improvements depend heavily on the appropriate selection of tile sizes. Many existing tile selection algorithms try to find tile sizes which eliminate self-interference cache conflic...
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
Exploiting cache locality of parallel programs at runtime is complementary to compiler optimization. This is particularly important for applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted ar...
Loop Transformations for Parallel Execution of a Class of Nested Loops on Shared-Memory Multiprocessors
Computationally intensive multi-dimensional integrals involving products of several arrays arise in some computational physics codes that model the electronic properties of semiconductors. This paper develops a framework for optimizing the parallel execution, on shared-memory multiprocessors, of a class of nested loop computations motivated by this application domain. The framework addresses the selec...
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time complements compiler-based approaches for applications with dynamic memory access patterns. This paper proposes a memory-layout oriented approach to exploit cache locality for parallel loops at run-time on Symmetric Multi-Processor (SMP) systems. Guided by application-dependent hints and the targeted cache architecture, it reorganizes and partit...